Automatically select packing ratio #622
Conversation
Force-pushed from 9285c85 to 6d53fca
collate_fn=collate_fn,
batch_size=dataloader_batch_size,
drop_last=cfg.drop_last,
# sampler=dist.get_sampler(dataset, # TODO why was this not used in the first return in the original code?
TODO: add the sampler back in.
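A minimal sketch of adding the sampler back, assuming Composer's dist.get_sampler and the surrounding dataset / collate_fn / cfg variables from this function (the shuffle flag is an assumption, not from the diff):

from composer.utils import dist
from torch.utils.data import DataLoader

dataloader = DataLoader(
    dataset,
    collate_fn=collate_fn,
    batch_size=dataloader_batch_size,
    drop_last=cfg.drop_last,
    # Re-add the distributed sampler that the packed branch currently omits.
    sampler=dist.get_sampler(dataset,
                             drop_last=cfg.drop_last,
                             shuffle=cfg.dataset.shuffle),
)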
for _, leftover in self.collator._leftover_bins:
    yield leftover


class BinPackCollator:
TODO: remove this class altogether in favor of BinPackDataset; the logic from __call__ should be moved into the BinPackDataset class.
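For reference, a rough sketch (my own, not the PR's implementation) of what a dataset-level packer could look like, assuming an upstream iterable of tokenized, unpadded examples whose values are plain Python lists:

from typing import Dict, Iterable, Iterator, List

from torch.utils.data import IterableDataset

class BinPackDataset(IterableDataset):
    """Greedily concatenates consecutive examples up to max_seq_len."""

    def __init__(self, dataset: Iterable[Dict[str, List[int]]], max_seq_len: int):
        self.dataset = dataset
        self.max_seq_len = max_seq_len

    def __iter__(self) -> Iterator[Dict[str, List[int]]]:
        bin_: Dict[str, List[int]] = {}
        bin_size = 0
        for example in self.dataset:
            size = len(example['input_ids'])
            if not bin_:
                bin_, bin_size = dict(example), size
            elif bin_size + size <= self.max_seq_len:
                # Append the new example's tokens onto the current bin, key by key.
                for k in bin_:
                    bin_[k] = bin_[k] + example[k]
                bin_size += size
            else:
                yield bin_
                bin_, bin_size = dict(example), size
        if bin_:
            # Flush the last, partially filled bin instead of dropping it.
            yield bin_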
'attention_mask',
'bidirectional_mask',
]

# Cut everything down to size
Remove this comment.
size, trimmed_example = extract_trim_batch_idx(batch, idx)
sizes.append(size)
trimmed_examples.append(trimmed_example)
sizes = [len(example['input_ids']) for example in examples]
Can we assume that we no longer need to trim examples if we pack at the dataset level?
Are datasets always unpadded?
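If padded inputs can still show up, one hedged fallback is to trim on the attention mask before packing; the helper below is illustrative and not part of the PR:

from typing import Dict, Tuple

import torch

def trim_example(example: Dict[str, torch.Tensor]) -> Tuple[int, Dict[str, torch.Tensor]]:
    # Count the real (non-pad) tokens, assuming a 0/1 attention_mask aligned with input_ids.
    size = int(example['attention_mask'].sum())
    trimmed = {k: v[:size] for k, v in example.items()}
    return size, trimmed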
# if k == 'sequence_id':
#     example[k] = torch.cat(
#         [example[k], add_on[k] + 1 + torch.max(example[k])])
TODO: add this back in.
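Restoring the commented-out logic would look roughly like the helper below (the function name and standalone form are mine; the torch.cat expression is the original code):

import torch

def append_sequence_ids(example_ids: torch.Tensor, add_on_ids: torch.Tensor) -> torch.Tensor:
    # Shift the appended ids past the current maximum so each packed sequence keeps a distinct id.
    return torch.cat([example_ids, add_on_ids + 1 + torch.max(example_ids)])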
# min_ratio = 2
# max_ratio = 2
# num_packing_ratios = 1
# profiling_results = profile_packing(dataloader_cfg, tokenizer, min_ratio,
#                                     max_ratio, num_packing_ratios,
#                                     device_batch_size)

# # Obtain the maximum packing_ratio/minimum padding that has no waste.
# i = 0
# waste = 0
# packing_ratio = 1
# while i < len(profiling_results) and waste == 0:
#     packing_ratio, _, waste = profiling_results[i]
#     i += 1
Uncomment and update the min/max ratios as appropriate. I'm probably going to go for something like max_ratio = max_seq_len / 100 and num_packing_ratios = 15.
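Uncommented, with the suggested values plugged in (min_ratio = 1 is my assumption); the selection loop is rewritten to keep the last zero-waste candidate, since the commented-out while loop above would stop one entry late, on the first candidate with waste:

min_ratio = 1
max_ratio = max_seq_len / 100
num_packing_ratios = 15
profiling_results = profile_packing(dataloader_cfg, tokenizer, min_ratio,
                                    max_ratio, num_packing_ratios,
                                    device_batch_size)

# Obtain the maximum packing_ratio/minimum padding that has no waste,
# assuming profiling_results is sorted by packing_ratio ascending.
packing_ratio = 1
for candidate_ratio, _, waste in profiling_results:
    if waste > 0:
        break
    packing_ratio = candidate_ratio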
return batches


def profile(raw_batch_size: int) -> Tuple[float, float]:
    packer = BinPackCollator(
Replace this in favor of BinPackDataset.
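A hedged sketch of what profile() could look like on top of a dataset-level packer like the BinPackDataset sketch above; big_batch and max_seq_len are assumed from the surrounding profiling code, and the padding/waste formulas are my approximations rather than the PR's:

from typing import Tuple

def profile(raw_batch_size: int) -> Tuple[float, float]:
    sample = big_batch[:raw_batch_size]  # unpadded examples to pack
    packed = list(BinPackDataset(iter(sample), max_seq_len=max_seq_len))
    total_tokens = sum(len(ex['input_ids']) for ex in sample)
    kept_tokens = sum(len(ex['input_ids']) for ex in packed)
    # Fraction of pad tokens after padding every packed example to max_seq_len.
    padding = 1 - kept_tokens / (len(packed) * max_seq_len)
    # Fraction of tokens a packer drops; the greedy sketch above never drops any.
    waste = 1 - kept_tokens / total_tokens
    return padding, waste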
assert packed_samples[1] == [7] * 7
assert packed_samples[2] == [6] * 6


# def test_auto_packing():
Add a test for the full auto-packing flow.
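Pending a full dataloader-level test, a hedged sketch that at least covers the ratio-selection step with fake profiling results (the tuple layout follows the selection sketch above, not the actual profile_packing output):

def test_auto_packing_ratio_selection():
    # Fake (packing_ratio, padding, waste) tuples, sorted by packing_ratio.
    profiling_results = [(1.0, 0.5, 0.0), (2.0, 0.3, 0.0), (4.0, 0.1, 0.05)]
    packing_ratio = 1
    for candidate_ratio, _, waste in profiling_results:
        if waste > 0:
            break
        packing_ratio = candidate_ratio
    # The largest zero-waste ratio should be chosen.
    assert packing_ratio == 2.0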
Closing because we decided to stick with the collator version. This may cause small amounts of waste in practice, but it is much simpler to implement and maintain.
Manual test
finetune-auto-pack-BAUz9w https://wandb.ai/mosaic-ml/irene-test/runs/e3pc1puh
finetune-auto-pack-baseline-lvmog9 https://wandb.ai/mosaic-ml/irene-test/runs/vdxwzlxg